Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Sun, Zhiqing, Yu, Longhui, Shen, Yikang, Liu, Weiyang, Yang, Yiming, Welleck, Sean, Gan, Chuang
Current AI alignment methodologies rely on human-provided demonstrations or judgments; as a result, the learned capabilities of AI systems would be upper-bounded by human capabilities. This raises a challenging research question: how can we keep improving these systems once their capabilities have surpassed human levels? This paper answers this question in the context of tackling hard reasoning tasks (e.g., level 4-5 MATH problems) by learning from human annotations on easier tasks (e.g., level 1-3 MATH problems), a setting we term \textit{easy-to-hard generalization}. Our key insight is that an evaluator (reward model) trained with supervision on easier tasks can be used effectively to score candidate solutions to harder tasks, and hence to facilitate easy-to-hard generalization across task difficulty levels. Based on this insight, we propose a novel approach to scalable alignment, which first trains process-supervised reward models on easy problems (e.g., level 1-3) and then uses them to evaluate the performance of policy models on hard problems. We show that such \textit{easy-to-hard generalization from evaluators} can enable \textit{easy-to-hard generalization in generators}, either through re-ranking or reinforcement learning (RL). Notably, our process-supervised 7B RL model achieves an accuracy of 34.0\% on MATH500, despite using human supervision only on easy problems. Our approach suggests a promising path toward AI systems that advance beyond the frontier of human supervision.
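The re-ranking route mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each sampled solution comes with per-step correctness scores from a process reward model (PRM), and the aggregation rules (minimum or product over steps), the helper names `solution_score`, `best_of_n`, and `weighted_vote`, and the sample scores are all hypothetical.

```python
from collections import defaultdict
from math import prod
from typing import Sequence

# Hypothetical candidate format: (final answer, per-step PRM scores in [0, 1]).
Candidate = tuple[str, Sequence[float]]


def solution_score(step_scores: Sequence[float], how: str = "min") -> float:
    """Collapse per-step PRM scores into a single solution-level score."""
    if how == "min":
        # Treat a solution as only as good as its weakest step.
        return min(step_scores)
    if how == "prod":
        # Probability that every step is correct, if steps were independent.
        return prod(step_scores)
    raise ValueError(f"unknown aggregation: {how!r}")


def best_of_n(cands: Sequence[Candidate], how: str = "min") -> str:
    """Return the final answer of the single highest-scoring candidate."""
    answer, _ = max(cands, key=lambda c: solution_score(c[1], how))
    return answer


def weighted_vote(cands: Sequence[Candidate], how: str = "min") -> str:
    """Sum solution scores per distinct final answer; return the heaviest answer."""
    totals: dict[str, float] = defaultdict(float)
    for answer, steps in cands:
        totals[answer] += solution_score(steps, how)
    return max(totals, key=totals.__getitem__)


# Toy data: three sampled solutions to one hard problem.
candidates: list[Candidate] = [
    ("12", [0.9, 0.8, 0.3]),
    ("15", [0.7, 0.6, 0.65]),
    ("12", [0.6, 0.5, 0.55]),
]
print(best_of_n(candidates))      # 15
print(weighted_vote(candidates))  # 12
```

In the toy data, best-of-N picks "15" because its weakest step scores highest, while weighted voting picks "12" because two moderately scored solutions agree on it; which rule works better is an empirical question this sketch does not settle.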
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
ACRE: Abstract Causal REasoning Beyond Covariation
Zhang, Chi, Jia, Baoxiong, Edmonds, Mark, Zhu, Song-Chun, Zhu, Yixin
Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data. Humans, even young toddlers, can induce causal relationships surprisingly well in various settings despite the task's notorious difficulty. However, in contrast to this commonplace trait of human cognition, there is no diagnostic benchmark to measure causal induction in modern Artificial Intelligence (AI) systems. Therefore, in this work, we introduce the Abstract Causal REasoning (ACRE) dataset for systematic evaluation of current vision systems in causal induction. Motivated by the stream of research on causal discovery in Blicket experiments, we query a visual reasoning system with four types of questions, in either an independent scenario or an interventional scenario: direct, indirect, screening-off, and backward-blocking, intentionally going beyond the simple strategy of inducing causal relationships by covariation. By analyzing visual reasoning architectures on this testbed, we find that pure neural models tend toward an associative strategy and perform at chance level, whereas neuro-symbolic combinations struggle with backward-blocking reasoning. These deficiencies call for future research on models with a more comprehensive capability for causal induction.
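To make the contrast with covariation concrete, the following sketch enumerates every hypothesis about which objects are blickets and keeps only those consistent with the observed trials. It assumes a deterministic disjunctive rule (the detector lights up if and only if at least one blicket is on it) and three answer labels ("on", "off", "undetermined"); the function names and trials are hypothetical and are not taken from the ACRE code.

```python
from itertools import combinations
from typing import Iterable, Iterator

# A trial: (objects placed on the detector, whether the detector lit up).
Trial = tuple[frozenset[str], bool]


def hypotheses(objects: Iterable[str]) -> Iterator[frozenset[str]]:
    """Every candidate set of blickets (all subsets of the objects)."""
    items = sorted(objects)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def lights_up(blickets: frozenset[str], placed: frozenset[str]) -> bool:
    # Assumed disjunctive rule: on iff at least one blicket is present.
    return bool(blickets & placed)


def infer(trials: list[Trial], query: frozenset[str], objects: Iterable[str]) -> str:
    """Answer 'on', 'off', or 'undetermined' over all consistent hypotheses."""
    consistent = [h for h in hypotheses(objects)
                  if all(lights_up(h, placed) == lit for placed, lit in trials)]
    if not consistent:
        raise ValueError("no hypothesis explains the trials under this rule")
    predictions = {lights_up(h, query) for h in consistent}
    if predictions == {True}:
        return "on"
    if predictions == {False}:
        return "off"
    return "undetermined"


def covariation(trials: list[Trial], query: frozenset[str]) -> str:
    """Associative baseline: 'on' if some queried object mostly co-occurred with activation."""
    rates = []
    for obj in query:
        outcomes = [lit for placed, lit in trials if obj in placed]
        if outcomes:
            rates.append(sum(outcomes) / len(outcomes))
    return "on" if rates and max(rates) > 0.5 else "off"


# Backward blocking: A and B together light it up, then A alone does too.
trials: list[Trial] = [
    (frozenset({"A", "B"}), True),
    (frozenset({"A"}), True),
    (frozenset({"C"}), False),
]
objects = {"A", "B", "C"}
print(covariation(trials, frozenset({"B"})))     # on
print(infer(trials, frozenset({"B"}), objects))  # undetermined
print(infer(trials, frozenset({"C"}), objects))  # off
```

B co-occurred with activation every time it appeared, so the covariation baseline answers "on"; but once A alone is shown to activate the detector, B's status is no longer constrained, and the hypothesis-based answer is "undetermined".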
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Mississippi (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.74)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
Time Series Classification via Topological Data Analysis
Karan, Alperen, Kaygun, Atabey
In this study, we use persistent homology to perform classification tasks on two publicly available multivariate time series datasets [19, 11] that include physiological data collected during stressful and non-stressful tasks. Instead of directly computing signal-specific features from sliding windows and subwindows on modalities such as electrocardiogram and wrist temperature (Figure 7), we extract features from persistence diagrams and their statistical properties. The subwindowing method allowed us to reduce noise without incurring extra computational cost. We then developed machine learning models and assessed their performance by varying window sizes and using different flavors of persistence diagrams.

Topological Data Analysis (TDA) techniques usually work with points embedded in an affine space of sufficiently large dimension. However, TDA techniques can still be applied to time series datasets, whether univariate or multivariate. Using delay embedding methods, one can convert a univariate time series into a finite collection of points in an affine space whose dimension is the chosen embedding dimension, and then compute the persistent homology of that point cloud. Since Takens' theorem implies that delay embeddings produce topologically invariant subsets for a non-chaotic dynamical system [21], one can reasonably expect persistent homology to produce features that distinguish different time series. A handful of studies have applied the persistent homology of delay embeddings to time series classification [23, 20, 1].
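The pipeline described above can be sketched in NumPy under simplifying assumptions: a delay embedding with hypothetical parameters `dim` and `tau`, only 0-dimensional persistence (connected components) of the Vietoris-Rips filtration, computed from a minimum spanning tree via Kruskal's algorithm with each edge entering at its length, and a few summary statistics of the lifetimes. Higher-dimensional features such as loops need a dedicated persistent homology library, and the study's exact features and window settings are not reproduced here.

```python
import numpy as np


def delay_embed(x, dim: int, tau: int) -> np.ndarray:
    """Map a 1-D series to points (x[j], x[j+tau], ..., x[j+(dim-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim/tau")
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)


def h0_persistence(points: np.ndarray) -> np.ndarray:
    """Death times of 0-dim classes in the Vietoris-Rips filtration.

    Every point is born at 0; components merge at the edge lengths of a
    minimum spanning tree (Kruskal). The single infinite class is omitted.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(dist[rows, cols], kind="stable")
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    deaths = []
    for k in order:
        ra, rb = find(int(rows[k])), find(int(cols[k]))
        if ra != rb:  # this edge merges two components
            parent[ra] = rb
            deaths.append(float(dist[rows[k], cols[k]]))
            if len(deaths) == n - 1:
                break
    return np.array(deaths)


def persistence_features(lifetimes: np.ndarray) -> dict[str, float]:
    """Simple statistics of finite lifetimes (death - birth; birth is 0 here)."""
    if lifetimes.size == 0:
        return dict.fromkeys(("count", "mean", "std", "max", "total"), 0.0)
    return {
        "count": float(lifetimes.size),
        "mean": float(lifetimes.mean()),
        "std": float(lifetimes.std()),
        "max": float(lifetimes.max()),
        "total": float(lifetimes.sum()),
    }


# Usage on one noisy synthetic window (not data from the study).
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
window = np.sin(t) + 0.05 * rng.normal(size=t.size)
cloud = delay_embed(window, dim=2, tau=10)
features = persistence_features(h0_persistence(cloud))
print(cloud.shape)  # (190, 2)
```

In a classification setting, one would compute such a feature vector per window (or subwindow) and channel, then feed the concatenated vectors to any standard classifier.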